ABSTRACT


This report describes a method for scheduling preventive
maintenance to minimise expected average hourly maintenance cost,
based on periodic observation of deterioration in one or more
equipment performance characteristics. The mathematical procedure
requires expressing the deterioration phenomenon in the form of a
simple Markov process. The implication of this method is that a
forecast of equipment failure is based only on the existing
performance level and is independent of any history of prior
deterioration rate. The criterion for scheduling preventive
maintenance is expressed as a method involving matrix multiplication
rather than as a simple algebraic formula or a series of curves. This
was necessitated by the large number of input parameters, which
consist of maintenance cost parameters and a matrix of probabilities
descriptive of the deterioration phenomenon.

Hypothetical numerical examples established the potential of this
method for achieving real saving in maintenance cost. The method
provides a systematic search for "lemon" equipments and, conversely,
protects against discarding those equipments which tend to maintain
high performance levels over extended periods of time.

As an added result of this analysis, the algebraic method provides
a technique for collecting deterioration data in terms of
distributions and not just averages.

It was apparent in the numerical work that the underlying failure
density function is critical in determining the amount of saving
which can be achieved by this method. The coefficient of variation
is especially critical. More theoretical studies and field data
collection in this area are indicated. It is important to observe,
however, that the method is distribution free, since it does not
depend on prior knowledge of the time-to-failure density function.
Nevertheless, the methods of data collection, the definition of
states or performance levels, and the selection of proper time
intervals present peculiar problems which require care in the
application of this method. These points are discussed in some
detail in the Appendix.

INTRODUCTION


The first ARINC Research Corporation monograph in this series
presented a method for determining a preventive maintenance schedule
based upon part replacement prior to in-service failure.* The model
used in that monograph related equipment operating time between such
preventive maintenance actions to the expected average hourly cost
of maintenance. This model was based on the assumption that a
measure of equipment deterioration was not available; consequently,
the method is applicable in those cases where deterioration is the
primary cause of failure but cannot be measured. For those cases in
which it is possible to monitor equipment performance periodically,
and thus to measure the degree of deterioration, a different
procedure for the selection of a minimum-cost preventive maintenance
schedule can be used. The development of such a procedure is the
purpose of this second paper in the series.


Basic to this paper are the assumptions that gradual deterioration
is the principal cause of failure in many equipments, and that
deterioration is reflected in the experimental data collected in an
equipment study. Measurements of equipment deterioration, together
with associated failure probabilities and cost parameters, are used
to compute expected average hourly maintenance costs. The "optimum"
preventive maintenance schedule is that in which the expected average
hourly cost of maintenance is at a minimum. The answer is expressed
in terms of the level of equipment performance and the time between
performance measurements.

* Welker, E. L., Relationship Between Equipment Reliability,
Preventive Maintenance Policy, and Operating Costs (Monograph No. 7),
ARINC Research Corporation, Washington, D. C., February 13, 1959
(Publication No. 101-9-135).

1.1 Type of Maintenance Situation Described in the Paper

The maintenance situation modeled here is illustrated by the
case in which maintenance personnel for a fleet of trucks periodically
measure the depth of tire tread. As the tread wears down there is an
increasing probability of tire failure in a given subsequent time
period. If failure of the tire in service is associated with costs
above the actual cost of replacing the tire (e.g., a blown tire
would result in lost man-hours or a wreck), it is desirable to
replace the tire at some convenient scheduled time prior to the time
of this in-service failure. Replacing the tire too soon will increase
the operating cost through wasted tire-miles. Replacing the tire
after the tread has become extremely worn will mean high cost through
a high in-service failure rate. Obviously there is an "optimum" depth
of tread which, from a cost standpoint, warrants replacement of the
tire.


Similarly, the ratio of signal-plus-noise to noise in a radio
receiver might be a performance characteristic upon which a
computation of "optimum" time for repair could be based. As the ratio
of signal-plus-noise to noise decreases, there is an increasing
probability that the receiver will fail in operation within a given
subsequent time period. Intuitively, there should be some optimum
ratio of signal-plus-noise to noise which, from a cost standpoint,
warrants equipment repair.

Equipment failures are usually determined by either or both of
two classes of personnel -- by maintenance personnel, who evaluate
the operational level of the equipment through periodic tests or
measurements; or by operating personnel, who evaluate the equipment
through observation of performance, the end output of the equipment
operation.


During periodic checks maintenance personnel will remove an
equipment from service if in their judgment the performance level is
too low to be adequate. Thus, the equipment may be removed from
service even though the operator does not express dissatisfaction with
performance. On the other hand, the operator may reject the equipment
even if its performance -- as subsequently measured by the maintenance
man -- is fairly high. An equipment which is taken out of service for
repair as a result of dissatisfaction on the part of either the
maintenance man or the operator is defined as a failure in this paper.
Maintenance actions performed as a result of such removal from service
are to be distinguished from maintenance actions on equipments whose
level of performance is considered adequate by both groups of
personnel. The latter actions are defined as preventive maintenance.


occurs -- a level below which there is agreement that the performance
of the set will not be satisfactory. It is recognized that there is
no clear line of demarcation between satisfactory and unsatisfactory
performance; the level indicated by the dotted line in Figure 1 is
introduced only for reasons of exposition. Possibly it is better
stated that there exists a different probability of failure -- as
defined in Section 1.3 -- with each performance level. This
difference in the probability of failure is reflected in the
probability matrix to be used, and is discussed in detail later.

The maintenance personnel examine an equipment at time t and
note the measured level of operation. If performance has a value
comparatively close to the dotted line, the decision may be made to
repair. This is particularly true if maintenance personnel conclude
that there is a high probability of failure prior to the next
maintenance inspection. The time interval to the next inspection is
obviously involved, since the longer this interval the greater the
probability of failure in the interval. Thus, the longer the time
interval to the next inspection the higher must be the level of
performance at which preventive action should take place. The
question to be answered is: What level of performance with what time
interval between inspections will provide minimum operating cost? The
model developed in Section 2 is offered as one method by which the
answer to this question may be determined.

GENERAL METHODOLOGY


2.1 The Markov Process

The maintenance situation described in the previous section
implies a model which can be described in terms of a Markov process.*
In order to do this, it is useful to think of the equipment
performance characteristics as discrete variables whose separate
measured levels are called "states." Similarly, it is convenient to
assume a discrete time variable for periodic performance measurement.
This time interval can be associated with the concept of "trial"
commonly encountered in discussions of Markov processes. In a Markov
process, one is concerned with the probabilities of transition from
one state to another in a single trial. The analogue in the present
case is the probability of a transition from one equipment
performance level to another in the time interval between
inspections. It is now necessary to describe these probabilities in
mathematical form, using the words "state" and "trial" for brevity of
expression and to facilitate reference to the discussions of the
Markov process in the literature.

The essential element of a Markov process is a set of conditional
probabilities, $p_{ij}$, the probability that if the equipment is
known to be in state i it will pass to state j in a single trial.
These probabilities can be conveniently written as a matrix, called
the transition matrix of the Markov process. For a case in which
there were two states, the transition matrix would appear as:

\[
\begin{array}{c|cc}
      & a_1    & a_2    \\ \hline
a_1   & p_{11} & p_{12} \\
a_2   & p_{21} & p_{22}
\end{array}
\]

* For a discussion of discrete Markov chains, see Feller, W.,
An Introduction to Probability Theory and Its Applications,
John Wiley & Sons, Inc., 1950, Chapters 15 and 16.

For clarity, the states are indicated in the rows and columns as
$a_1$ and $a_2$. The matrix value $p_{11}$ is the probability that if
state $a_1$ existed at trial k, this state would still exist in the
next trial, k + 1. The value $p_{12}$ is the probability that if
state $a_1$ existed at trial k, it would change to state $a_2$ in
trial k + 1. Thus, we have a set of conditional probabilities: given
that state $a_i$ exists, $p_{ij}$ is the probability of being in
state $a_j$ following the immediately subsequent trial. Since the
above transition matrix states only what happens during the
transition should state $a_i$ exist, it is necessary to know
initially what state does, in fact, exist.

The initial probabilities associated with the existence of the
various states are conveniently written as a probability vector -- a
vector with as many columns as there are possible states.* For the
two-parameter situation above ($a_1$, $a_2$), the probabilities of
initially being in the two states are written as the vector

\[ \begin{bmatrix} q_1 & q_2 \end{bmatrix} \]

where $q_1$ is the probability of beginning in state $a_1$, and $q_2$
is the probability of beginning in state $a_2$. If an experiment
always began in state $a_1$, then the vector would be:

\[ \begin{bmatrix} 1.0 & 0 \end{bmatrix} \]

If there were an equal chance that the experiment would start out in
either of the two classes, the vector would be:

\[ \begin{bmatrix} .5 & .5 \end{bmatrix} \]

If this vector is multiplied by the transition matrix, it will
produce the probabilities of being in the two states at the end of
the first period. The product also will be a probability vector. If
this second vector is multiplied by the transition matrix, it will,
in turn, produce the probable states in the second period, and so on.
This procedure is referred to as "chaining."

* A probability vector is a one-row matrix whose elements are
non-negative and total 1.0.

A matrix of transition probabilities (in fact any stochastic
matrix), together with a set of initial probabilities, completely
determines a discrete-parameter Markov chain.*

* A matrix is stochastic if each of the rows totals 1.0 and there
are no negative terms. In this paper, only this type of matrix
is discussed.

To illustrate, consider the following example: A man takes a
business trip and leaves his wife at home. It is well established
that wives consider business trips only as riotous interludes to a
humdrum existence which, in their position of servitude, they are
denied. To mollify her sense of social injustice, there is a certain
probability that the wife will immediately buy a new hat. Thus, there
are two possible states: $h_1$, in which a hat is purchased; and
$h_2$, in which a hat is not purchased. There are probabilities
associated with each, which each businessman must empirically
determine for himself. Thus, the traveler anticipating a possible
outcome of his trip could put these two states in a probability
vector, z:

\[
z = \begin{array}{cc} h_1 & h_2 \\ .9 & .1 \end{array}
\]

In vector z, $h_1$ denotes "hat", and $h_2$ denotes "no hat."


For every week that the man is away from home, there is another
set of probable events. At the end of the week, or at the beginning
of the second period, there is a probability that the wife, having
purchased the hat, keeps it ($p_{11}$); the probability that the wife,
not having previously purchased the hat, does so ($p_{21}$); the
probability that the wife purchased the hat and returned it
($p_{12}$); and the probability that the wife did not purchase the
hat and therefore did not return it ($p_{22}$). (Note that the first
number in the subscript pertains to the state in the first period and
the second number to the state in the second period.) These
probabilities form the transition matrix previously described --
i.e., the sum-total of all possible outcomes. Since the first
subscript is the row subscript in the matrix notation, the rows will
denote the states in the first time period k, and the columns (the
second subscript) will denote the states in the second time period.
For example, a transition matrix might have the following values:

\[
A = \begin{array}{c|cc}
      & h_1 & h_2 \\ \hline
h_1   & .8  & .2  \\
h_2   & .7  & .3
\end{array}
\]

If one multiplies the probability vector z on the right by matrix
A, the result will be the probability that the wife will have a hat
at the conclusion of the first time period -- i.e., one week.* Thus,

\[
zA = \begin{bmatrix} .9 & .1 \end{bmatrix}
\begin{bmatrix} .8 & .2 \\ .7 & .3 \end{bmatrix}
= \begin{bmatrix} .9(.8) + .1(.7) & .9(.2) + .1(.3) \end{bmatrix}
= \begin{bmatrix} .79 & .21 \end{bmatrix} = z^{(1)}
\tag{1}
\]

* This is row-column multiplication of matrices. For those not
familiar with matrix algebra, a short, lucid discussion can be
found in Mood, A. M., Introduction to the Theory of Statistics,
McGraw-Hill Book Company, Inc., New York, 1950.


The probability of having a hat at the end of the first week,
$h_1$, is the probability that a hat was purchased times the
probability that the hat, having been purchased, was kept (.9 x .8),
plus the probability it was not purchased at the beginning of the
first week times the probability it was purchased at the end of the
week (.1 x .7). Thus, the probability that the wife owns a new hat at
the beginning of the second week is .79.


What happens as the second week passes? If one assumes that the
wife's feelings of servitude have not increased with the resulting
purchase of a grand piano, an estimate of events may then be had by
multiplying the second vector, $z^{(1)}$, times the transition matrix.

\[
z^{(1)}A = \begin{bmatrix} .79 & .21 \end{bmatrix}
\begin{bmatrix} .8 & .2 \\ .7 & .3 \end{bmatrix}
= \begin{bmatrix} .779 & .221 \end{bmatrix} = z^{(2)}
\tag{2}
\]


Thus, there is a smaller probability that the wife will own a new hat
at the end of the second week than at the end of the first. (It is
hoped that the logic of this outcome will not invalidate the example.)
Successive multiplications of the resulting vectors times the original
matrix will produce the probabilities for succeeding weeks away from
home.
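
The week-by-week arithmetic of this example can be reproduced
mechanically. The following is a minimal Python sketch and is not part
of the original monograph; the function names `step` and `chain` are
illustrative choices. Only the starting vector [.9 .1] and the matrix
entries .8, .2, .7, .3 are taken from the text.

```python
def step(z, A):
    """Multiply the row vector z by the matrix A (row-column multiplication)."""
    return [sum(z[i] * A[i][j] for i in range(len(z)))
            for j in range(len(A[0]))]


def chain(z, A, trials):
    """Return the probability vectors after 1, 2, ..., `trials` trials."""
    vectors = []
    for _ in range(trials):
        z = step(z, A)
        vectors.append(z)
    return vectors


# Values from the hat example in the text.
z = [0.9, 0.1]        # initial probabilities of "hat" and "no hat"
A = [[0.8, 0.2],      # row h1: hat kept, hat returned
     [0.7, 0.3]]      # row h2: hat bought, still no hat

weeks = chain(z, A, 3)
for week, vector in enumerate(weeks, start=1):
    print(week, [round(p, 4) for p in vector])
```

Each pass through the loop is one "trial" in the sense of the text, so
the first two printed vectors correspond to the results of equations
(1) and (2).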


After a few multiplications (weeks), the businessman will note
that the two values change very little between successive weeks and
soon become "stabilized." This would indicate that after a while, a
longer stay will not materially change the probabilities that a hat
will be purchased. For the third multiplication, $z^{(2)}A$,

\[
z^{(2)}A = \begin{bmatrix} .779 & .221 \end{bmatrix}
\begin{bmatrix} .8 & .2 \\ .7 & .3 \end{bmatrix}
= \begin{bmatrix} .7779 & .2221 \end{bmatrix} = z^{(3)}
\tag{3}
\]


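These same vectors, $z^{(1)}$, $z^{(2)}$, $z^{(3)}$, can be
generated by raising the matrix to successively higher powers and
multiplying each power of the matrix by the original probability
vector, z. Thus,

\[
\begin{aligned}
zA   &= \begin{bmatrix} .9 & .1 \end{bmatrix}
        \begin{bmatrix} .8 & .2 \\ .7 & .3 \end{bmatrix}
      = \begin{bmatrix} .79 & .21 \end{bmatrix} = z^{(1)} \\
zA^2 &= \begin{bmatrix} .9 & .1 \end{bmatrix}
        \begin{bmatrix} .78 & .22 \\ .77 & .23 \end{bmatrix}
      = \begin{bmatrix} .779 & .221 \end{bmatrix} = z^{(2)} \\
zA^3 &= \begin{bmatrix} .9 & .1 \end{bmatrix} A^3
      = \begin{bmatrix} .7779 & .2221 \end{bmatrix} = z^{(3)}
\end{aligned}
\]

et cetera.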
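
The equivalence between chaining and matrix powers can be checked
numerically. The sketch below is an illustration in Python rather than
anything given in the monograph; `mat_mul` and `mat_pow` are
hypothetical helper names, and the numbers are those of the hat
example.

```python
def mat_mul(X, Y):
    """Row-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_pow(A, n):
    """Return A raised to the n-th power, for n >= 1."""
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result


A = [[0.8, 0.2],
     [0.7, 0.3]]
z = [[0.9, 0.1]]                     # the probability vector as a 1 x 2 matrix

A2 = mat_pow(A, 2)                   # second power of the transition matrix
z3 = mat_mul(z, mat_pow(A, 3))[0]    # z times the third power
A10 = mat_pow(A, 10)                 # a higher power; its rows nearly coincide

print(A2)
print(z3)
print(A10)
```

Printing a high power such as `A10` shows the two rows becoming almost
identical, which is the convergence property taken up next in the
text.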

Also, as the stochastic matrix is raised to higher and higher powers,
$a_{ij}$ and $a_{kj}$ approach equality for every i and k. Each row
will approach the same values as the probability vector obtained by
multiplying this power of the matrix by the probability vector.*

* The following discussion will involve only the above-mentioned
matrix characteristics. A more complete discussion can be found in
Kemeny, J. G., Snell, J. L., and Thompson, G. L., Introduction to
Finite Mathematics, Prentice-Hall, Inc., 1958, Chapter V. On pages
220 and 221, there are two theorems of interest:

(1) If P is a regular stochastic matrix, then:

(a) the powers $P^n$ approach a matrix T,

(b) each row of T is the same probability vector t,

(c) the components of t are positive.

(2) If P is a regular stochastic matrix and T and t are given
by the previous theorem, then $pP^n$ approaches t whenever p
is any probability vector.

This "stable state", as it will be referred to hereafter, is
approached as a limit,

\[
\lim_{n \to \infty} z^{(n)}A = \lim_{n \to \infty} zA^{n+1} = z^{(\infty)} .
\]

However, it is not necessary to carry out this limiting process to
determine $z^{(\infty)}$. It can be determined as a vector which, when
multiplied on the right by the transition matrix, will reproduce
itself. This means that $z^{(\infty)}$ can be found by solving a
system of linear equations. If the vector $z^{(\infty)}$ is
represented by $[\alpha_1 \;\; \alpha_2]$, and the same stochastic
matrix as in the previous example is used, we have:

\[
\begin{bmatrix} \alpha_1 & \alpha_2 \end{bmatrix}
\begin{bmatrix} .8 & .2 \\ .7 & .3 \end{bmatrix}
= \begin{bmatrix} \alpha_1 & \alpha_2 \end{bmatrix}
\]

By regular matrix multiplication, we obtain the system of equations,

\[
\begin{aligned}
.8\alpha_1 + .7\alpha_2 &= \alpha_1 \\
.2\alpha_1 + .3\alpha_2 &= \alpha_2
\end{aligned}
\]

These are two linear, homogeneous equations with no unique,
non-trivial solution. However, one equation can be replaced by

\[ \alpha_1 + \alpha_2 = 1 \]

which follows from the fact that $z^{(\infty)}$ is a probability
vector. This gives the system

\[
\begin{aligned}
.2\alpha_1 - .7\alpha_2 &= 0 \\
\alpha_1 + \alpha_2 &= 1 .
\end{aligned}
\]

The solution of these two linear equations yields $\alpha_2 = 2/9$
and $\alpha_1 = 7/9$, or $\alpha_2$ = .2222... and $\alpha_1$ =
.7777.... Note that these values are not greatly different from those
in $z^{(2)}$ and $z^{(3)}$. This fact indicates very rapid
convergence in this Markov chain.
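
For matrices with more than two states the same substitution (drop one
homogeneous equation, add the condition that the components total 1)
can be carried out by elimination. The following Python sketch is an
illustration under that assumption, not the monograph's own procedure;
the name `stable_vector` is hypothetical, and exact fractions are used
so that the result can be compared directly with 7/9 and 2/9.

```python
from fractions import Fraction


def stable_vector(A):
    """Solve z = zA together with sum(z) = 1 by Gauss-Jordan elimination.

    Assumes the chain has a unique stable vector (for example, a regular
    stochastic matrix); otherwise no pivot is found and StopIteration is raised.
    """
    n = len(A)
    # Row j of M is the equation sum_i z_i * (A[i][j] - delta_ij) = 0.
    M = [[Fraction(A[i][j]) - (1 if i == j else 0) for i in range(n)]
         + [Fraction(0)]
         for j in range(n)]
    # Replace the last homogeneous equation by z_1 + ... + z_n = 1.
    M[n - 1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


A = [[Fraction(8, 10), Fraction(2, 10)],
     [Fraction(7, 10), Fraction(3, 10)]]
z_inf = stable_vector(A)
print(z_inf)
```

Working in `Fraction` rather than floating point is a design choice
made here only so that the answer appears as exact ninths.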


2.1.1 Markov Process and Continuous Model

This Markov process is clearly a discrete one, consisting of
finite steps or trials. Such a model is ideal for describing some
phenomena which occur in distinct steps rather than continuously with
time. Experiments in genetics often furnish good examples. A given
mating will produce an offspring with probabilities of certain given
characteristics. The mating of the offspring -- a finite step --
will produce the given characteristics with another set of
probabilities. In the field of electronics, the return of a signal on
each rotation of a radar antenna -- the blip/scan ratio -- has been
described by a Markov process. Here each turn of the radar forms a
distinct "trial." However, in the equipment deterioration phenomenon
considered here, the situation is no longer discrete but is a
continuous function of time, and the values which appear in the
transition matrix are dependent upon the time interval selected.
Consider deterioration from state $a_1$ to state $a_2$. The value of
$p_{12}$ would be considerably smaller if determined over an interval
of one hour than if determined over an interval, say, of one month.
In using a Markov process in deterioration models one must be aware
of this discrete aspect. This alone should not present great
difficulties, for continuous phenomena have long been approximated by
discrete methods, and it is a natural consequence of periodic rather
than continuous monitoring of equipment performance. But it means
that the selection of the time interval for developing the transition
matrix must be made with some consideration of this inherent discrete
characteristic.
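
The dependence of $p_{12}$ on the chosen interval can be illustrated
numerically. The sketch below is a hypothetical Python example, not
data from the monograph: it assumes a two-state chain whose hourly
transition probabilities stay constant, so that the matrix for a
longer interval is a power of the hourly matrix. The hourly value .001
and the 720-hour "month" are invented for illustration.

```python
def mat_mul(X, Y):
    """Row-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transition_over(A, periods):
    """Transition matrix for `periods` consecutive basic intervals (periods >= 1)."""
    result = A
    for _ in range(periods - 1):
        result = mat_mul(result, A)
    return result


# Hypothetical hourly matrix: state a1 deteriorates to a2 with probability .001,
# and a2 is never left within the interval.
hourly = [[0.999, 0.001],
          [0.0, 1.0]]

monthly = transition_over(hourly, 720)   # assumed 720 operating hours per month

p12_hour = hourly[0][1]
p12_month = monthly[0][1]
print(p12_hour, round(p12_month, 3))
```

Under these assumptions the monthly value of $p_{12}$ is several
hundred times the hourly one, which is why the interval has to be
fixed before the matrix is estimated.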

Attention is called to another aspect of the Markov process.
The development of each step in the Markov chain uses only the
information (the probable states) which existed in the immediately
preceding time period and no other. In other words, the states
existing prior to the immediately preceding one are not drawn upon
for information, nor is the manner in which the preceding state was
reached used as contributing information. This is stated by Feller
(p. 337): "Conceptually, a Markov process is the probabilistic
analogue of the process of classical mechanics where the future
development is completely determined by the present state and is
independent of the way in which the present state has developed."


If the deterioration process preceding a measurement has been
exceedingly rapid, or conversely slow, the incorporation of this
information should contribute to a better prognostication of system
performance. To the degree that this information is not used, the
first-order Markov process may leave room for improvement in actual
application.* Perhaps similar application of higher order processes
may correct this, but this is not believed to be a critical deficiency
in the model in view of its intended application. The procedure to
be used here does make use of a significant portion of the available
information, and later ARINC Research Corporation studies will examine
methods by which the additional information might be incorporated.

* The Markov process described in the previous example is a first-
order or simple Markov process. Higher order Markov processes
are those in which the transition probabilities depend on two or
more preceding time periods. See Doob, J. L., Stochastic Processes,
p. 89, John Wiley, 1953.


2.1.2 Application of the Model to a Theoretical Problem


The transition matrix is determined experimentally in the manner
described in the appendix. The probabilities $p_{ij}$ are determined
from periodic measurements of the performance of a number of
equipments. At each measurement period, the number of equipments
which have moved from one state to another is recorded. The number of
levels of equipment-states is a matter of instrumentation and
judgment. More sensitive instruments will permit the selection of
more classes or states into which the variable can be divided.
Generally, the more classes there are in the transition matrix the
better, since we are approximating continuous phenomena by discrete
steps. However, a large number of classes may require that processing
be done by machine, since the matrix computations are certain to be
laborious. For purposes of explanation, assume only four classes or
states of system performance. Let state $a_1$ be peak performance,
state $a_2$ be intermediate performance, state $a_3$ be marginal
performance, and state $a_4$ be failure. A series of observations of
equipment performance would yield the following matrix:


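\[
\begin{array}{c|cccc}
      & a_1    & a_2    & a_3    & a_4    \\ \hline
a_1   & p_{11} & p_{12} & p_{13} & p_{14} \\
a_2   & p_{21} & p_{22} & p_{23} & p_{24} \\
a_3   & p_{31} & p_{32} & p_{33} & p_{34} \\
a_4   & p_{41} & p_{42} & p_{43} & p_{44}
\end{array}
\]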
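
The recording of state-to-state movements described above lends itself
to a simple tabulation. The Python sketch below is one plausible
reading, offered as an assumption: each $p_{ij}$ is estimated as the
fraction of equipments found in state i at one inspection that are
found in state j at the next. The procedure actually used is the one
given in the appendix, which may differ; the function name and the
observation counts are hypothetical.

```python
def estimate_transition_matrix(observations, n_states):
    """Estimate p_ij as a relative frequency from (before, after) state pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for before, after in observations:
        counts[before][after] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state never seen at the start of an interval yields no estimate.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Invented records: (state at inspection k, state at inspection k + 1),
# with states a1..a4 coded as 0..3.
observations = (
    [(0, 0)] * 70 + [(0, 1)] * 25 + [(0, 3)] * 5 +
    [(1, 1)] * 60 + [(1, 2)] * 30 + [(1, 3)] * 10 +
    [(2, 2)] * 50 + [(2, 3)] * 50 +
    [(3, 3)] * 20
)

P = estimate_transition_matrix(observations, 4)
for row in P:
    print(row)
```

Because every row is divided by its own total, each non-empty row of
the estimate totals 1.0, as a stochastic matrix requires.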

The interval selected for collection of data must be sufficiently
short that only a few equipments will "skip" classes as their
performance deteriorates. What this interval is will depend upon
experimentation and experience with the equipment. The error
resulting from selection of too short an interval will not be large.
However, if too long an interval is selected, "second generation"
equipments which have been repaired and returned to higher levels of
operation may materially affect the data. This problem is covered in
the appendix.

2.1.3 Classification of Failures in the Transition Matrix


In the preceding discussion, operating state $a_4$ was defined as
failure. However, equipments which actually have values higher than
$a_4$ may be removed from service if operating personnel are
dissatisfied with performance. Thus, when the transition matrix is
prepared from experimental data, class $a_4$ will have a dual
meaning. For example, if an equipment has measured performance in
class $a_2$, but is known to have been ordered removed from service
by the operator during the interval, this equipment is classed in
state $a_4$. Unless there is perfect functional dependence between
the measured characteristic and the frequency of failure, state $a_4$
will include a combination of equipments -- those actually observed
to be in state $a_4$ at the end of the interval and all other
equipments removed for repair, regardless of their measured level of
operation.

This failure classification can be described another way.
Suppose that the transition matrix is based solely on the measured
values of equipment performance, and the same four levels of the
characteristic are assumed. The matrix can be written as

\[
A = \begin{array}{c|cccc}
      & a_1    & a_2    & a_3    & a_4    \\ \hline
a_1   & p_{11} & p_{12} & p_{13} & p_{14} \\
a_2   & p_{21} & p_{22} & p_{23} & p_{24} \\
a_3   & p_{31} & p_{32} & p_{33} & p_{34} \\
a_4   & 0      & 0      & 0      & 1
\end{array}
\]

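With each level of performance, there is a certain probability of
failure, and as the performance level decreases, we expect an
increasing probability of failure. Thus there are failure
probabilities associated with each state:

\[ \begin{bmatrix} f_1 & f_2 & f_3 & 1 \end{bmatrix} \]

Again it is assumed that the last measured state unmistakably
represents an equipment failure.

The probability that an equipment in state $a_i$ is not declared a
failure is $1 - f_i$. A failure/non-failure matrix can be written:

\[
F = \begin{bmatrix}
1 - f_1 & 0       & 0       & f_1 \\
0       & 1 - f_2 & 0       & f_2 \\
0       & 0       & 1 - f_3 & f_3 \\
0       & 0       & 0       & 1
\end{bmatrix}
\]

Then,

\[
AF = \begin{bmatrix}
p_{11}(1-f_1) & p_{12}(1-f_2) & p_{13}(1-f_3) & p_{11}f_1 + p_{12}f_2 + p_{13}f_3 + p_{14} \\
p_{21}(1-f_1) & p_{22}(1-f_2) & p_{23}(1-f_3) & p_{21}f_1 + p_{22}f_2 + p_{23}f_3 + p_{24} \\
p_{31}(1-f_1) & p_{32}(1-f_2) & p_{33}(1-f_3) & p_{31}f_1 + p_{32}f_2 + p_{33}f_3 + p_{34} \\
0             & 0             & 0             & 1
\end{bmatrix}
\]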

In this example, $p_{11}(1 - f_1)$ is the probability that after one
time interval the equipment will remain in state $a_1$ if failure has
not occurred; $p_{12}(1 - f_2)$ is the probability that after one time
interval the equipment will have moved to state $a_2$, again if failure
has not occurred; and so on. The fourth column gives the failure
probabilities in the dual sense. This type of transition matrix is
the one which is dealt with in the remainder of the paper.
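
The construction of the product AF can be checked on numbers. The
following Python sketch is illustrative only: the measured-state
matrix and the failure probabilities $f_1$, $f_2$, $f_3$ are invented,
and the failure/non-failure matrix is built in the form implied by the
entries of AF quoted in the text (each state i keeps probability
$1 - f_i$ on the diagonal and sends $f_i$ to the failure column). The
helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Row-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def failure_matrix(f):
    """Build the failure/non-failure matrix from f = [f1, ..., f_(n-1)].

    The last state is treated as certain failure.
    """
    n = len(f) + 1
    F = [[0.0] * n for _ in range(n)]
    for i, fi in enumerate(f):
        F[i][i] = 1.0 - fi      # equipment stays in its measured state
        F[i][n - 1] = fi        # equipment is declared a failure
    F[n - 1][n - 1] = 1.0
    return F


# Hypothetical matrix based solely on measured performance levels a1..a4.
A = [[0.7, 0.2, 0.1, 0.0],
     [0.0, 0.6, 0.3, 0.1],
     [0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 1.0]]
f = [0.05, 0.10, 0.30]          # invented failure probabilities f1, f2, f3

AF = mat_mul(A, failure_matrix(f))
for row in AF:
    print([round(p, 4) for p in row])
```

Since both factors are stochastic matrices, every row of the product
again totals 1.0, so AF can be used as a transition matrix in its own
right.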


2.1.4 The Extension to More Than One Performance Characteristic


The  failure  probability  vector  has  two  extremes,  which  for  four 
measured  levels  are 
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In  the  first  extreme,  failure  is  functionally  dependent  upon  the 
measured  performance  characteristic;  in  the  second,  when  each  entry 
is  1/4,  the  measured  performance  characteristic  is  of  no  value  in  the 
prediction  of  failure  and  another  characteristic  should  be  sought. 
Between  these  two  extremes,  a  combination  of  characteristics  is 
suggested  —  that  is,  a  second  measured  characteristic  may  account  for 
causes of failure not covered by the first one. It is possible to form a system of states based upon a combination of measurements of the two characteristics in the following manner. Suppose, for example, that the states of the initial characteristic are a1, a2, a3, and a4 and that the states of the second characteristic are b1, b2, and b3. This array of combined states may be written as


Thus, each combined state c represents one state ai for the first characteristic and one state bj (b2, for example) for the second characteristic. It is now possible to treat the c's as a new variable with 12 states, by use of the methods described herein. The extension to more than two characteristics is obvious.
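
The pairing of states can be sketched in a few lines of Python. This is only an illustration: the names `first`, `second`, `combined`, and `index` are hypothetical, and the ordering of the pairs is an assumption, since any consistent ordering would serve.

```python
from itertools import product

first = ["a1", "a2", "a3", "a4"]   # states of the first characteristic
second = ["b1", "b2", "b3"]        # states of the second characteristic

# Each combined state pairs one state of each characteristic.
combined = list(product(first, second))

# A single index 0..11 lets the pairs serve as the states of one Markov chain.
index = {pair: k for k, pair in enumerate(combined)}

print(len(combined), index[("a1", "b2")])
```

A transition matrix for the combined variable would then have 12 rows and 12 columns, one for each pair.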


Numerical  Example  of  the  Model 


Consider  a  numerical  example  of  a  transition  matrix,* 


* This matrix was developed as the matrix AF in Section 2.1.3. For convenience it will be denoted simply by A in the remainder of the paper.
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This  matrix  has  the  property  that  once  an  equipment  progresses  Into 
a lower state it will not, in the interval, go to a higher state, i.e., repair itself. Thus A is a triangular matrix in which the pij are all zero for i > j. (Note also that p44 = 1.0. If a set begins a period in the failure state, a4, it will be there at the end of the period.)

Usually, one would expect a transition matrix to have non-zero values below the main diagonal. However, if the characteristic being measured is really a deterioration phenomenon, the probabilities of transition to higher performance states should be quite small. The matrix should tend to be triangular in the sense that the values of pij should be very small if i > j. Furthermore, one would usually expect pij to be quite small if i is much smaller than j. In summary, this means that the performance characteristic selected for measurement should be one for which improvement is rare, deterioration is common, and states are so defined that equipments do not commonly deteriorate more than one state in the basic time interval of the transition matrix. All three of these properties are satisfied by the numerical example selected. The first condition is satisfied since p21, p31, p32, p41, p42, and p43 are all zero. The non-zero values of p11, p12, p22, p23, p33, p34, and p44 are consistent with the second property. Finally, the zero values for p13, p14, and p24 reflect the third property. It should be stressed, however, that the method can be used to develop state distributions no matter what the form of the stochastic transition matrix.


From an engineering viewpoint, it is important to realize that the assumptions of deterioration and accurate instrumentation naturally lead toward a triangular matrix. If the experimental data do not reflect this, it would suggest that the measured characteristic is not a good one on which to base prediction of failure, or that the accuracy of measurement is too crude to monitor equipment performance, or that both of these conditions hold. In this case, the situation must be examined to see if another characteristic must be selected, or if instrumentation can be improved.

We can, from this information, generate the failure density function of the equipment. If the equipments all begin in state a1, then the initial probability vector is:

 a1  a2  a3  a4

[ 1   0   0   0 ] .

If this vector is multiplied by the transition matrix, the distribution by state at the end of one time interval is obtained. Repetition of this process generates the state distribution over time as shown in Table 1. Columns 2 through 5 give the state distribution for the times shown in Column 1. This constitutes an important description of the equipment deterioration pattern, based entirely on the transition matrix. The failures, which were identified as state a4, are shown in Column 5 in the form of the unreliability function, U(t). This results from the fact that the process includes no repair, so the cumulative failure frequency is developed. The failure density function, u(t), in Column 6, gives the probability of failure in each time period. (See Figure 2.) It is obtained by taking differences between U(t) values of Column 5. The reliability function, shown in Column 7, is computed from Column 5 by the formula

R(t)  =  1  -  U(t)  . 

The  average  level  of  measured  equipment  performance  was  basic  in 
the  statement  of  the  problem  given  earlier  in  this  report.  Such  an 
average  level  is  in  reality  an  "average  state"  as  a  function  of  time. 
In order to compute such an average state and to draw a graph for the transition matrix, it is necessary to identify states numerically instead of just by names such as a1, a2, a3. Column 8 shows average state based on the assignment of numerical value 1 to a1, 2 to a2, etc. Thus, each entry in Column 8 is the average state of the probability vector in Columns 2 through 5 for the time shown in Column 1.
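
The column-by-column construction described above can be sketched as follows. This is a minimal illustration, not the report's own computation: the matrix is the four-state numerical example repeated later in this section, and the helper names `step` and `life_table` are hypothetical.

```python
# Transition matrix A of the four-state example (state a4 = failure).
A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]


def step(z, m):
    """Multiply row vector z by matrix m: the distribution one interval later."""
    return [sum(z[i] * m[i][j] for i in range(len(z))) for j in range(len(m[0]))]


def life_table(m, periods):
    """Return rows (t, z, U, u, R, average state) for t = 1..periods."""
    z = [1.0] + [0.0] * (len(m) - 1)     # all equipments start in state a1
    rows, previous_U = [], 0.0
    for t in range(1, periods + 1):
        z = step(z, m)
        U = z[-1]                        # cumulative failures (unreliability)
        u = U - previous_U               # failures occurring in interval t
        R = 1.0 - U                      # reliability
        average = sum((k + 1) * p for k, p in enumerate(z))  # a1 -> 1, a2 -> 2, ...
        rows.append((t, z, U, u, R, average))
        previous_U = U
    return rows


for t, z, U, u, R, average in life_table(A, 8):
    print(t, [round(p, 4) for p in z], round(u, 4), round(R, 4), round(average, 3))
```

Each printed row corresponds to one time interval: the state distribution, the failure density u(t), the reliability R(t), and the average state.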


FIGURE 2

TIME-TO-FAILURE DENSITY FUNCTION, u(t)

Figure 3 is a graph of the average state values in Column 8 plotted against time, Column 1. In addition, the state distribution for time t = 8 is shown in the upper right-hand corner of the figure.

FIGURE 3

AVERAGE DETERIORATION OF A PERFORMANCE CHARACTERISTIC

It  should  be  noted  that  the  method  of  selecting  a  preventive 
maintenance  criterion  is  not  dependent  on  the  assignment  of  numerical 
values to the various states. Numerical values are assigned* here only in the desire to describe the deterioration phenomenon by means of the transition matrix, and to relate the computations in the remaining portion of this report to the problem stated in terms of deterioration.


*  In  most  cases,  the  numerical  values  would  be  given  directly  by 
the  characteristic  measurement. 


The  foregoing  discussion  is  concerned  primarily  with  the 
development  of  the  basic  concept  that  deterioration  phenomena  can 
be  represented  by  a  Markov  process.  Reference  has  been  made  to  the 
problems  associated  with  data  collection,  definition  of  suitable 
performance  levels,  and  determination  of  a  time  interval  for  the 
fundamental  transition  matrix  consistent  with  repair-time  require¬ 
ments. 


At  this  point,  it  is  necessary  to  describe  methods  whereby 
matrices  can  be  modified  to  provide  for  repair  of  in-service  failure 
at  times  consistent  with  normal  maintenance  practices,  and  also  to 
provide for independently scheduled preventive maintenance. It is
reasonable  to  assume  that  preventive  maintenance  will  be  scheduled 
at  intervals  which  are  long  compared  to  the  time  required  for  the 
repair  of  an  in-service  failure. 

When the unit of time is the interval covered by the basic transition matrix, the problem is to develop a method for computation of the expected number of in-service failures which will occur in n time unit intervals (with repair of in-service failures at the end of each unit interval and preventive maintenance at the end of the nth unit interval) and to express the entire process in matrix form.

It is assumed here that maintenance always occurs at the end of each unit interval.

For the sake of convenience, the numerical matrix shown on page 20 is repeated here. It will be recalled that state a4 constitutes failure.

$$
A =
\begin{array}{c|cccc}
 & a_1 & a_2 & a_3 & a_4 \\ \hline
a_1 & .5 & .5 & 0 & 0 \\
a_2 & 0 & .5 & .5 & 0 \\
a_3 & 0 & 0 & .1 & .9 \\
a_4 & 0 & 0 & 0 & 1.0
\end{array}
$$


If it is assumed that repair of an in-service failure returns the equipment to the highest operating state, the basic matrix is modified by adding Column 4 to Column 1 to form a new Column 1. Since such repairs are made at the end of each time interval, the basic transition matrix becomes:

$$
A_{-1} = \begin{bmatrix}
.5 & .5 & 0 \\
0 & .5 & .5 \\
.9 & 0 & .1
\end{bmatrix}
$$

Note that Column 4 and Row 4 of Matrix A have been deleted. This is done here because the repair of in-service failures precludes the existence of equipments in state a4, and therefore a matrix with three rows and three columns is adequate to describe the situation.*


It is of interest to note that this operation can be expressed as a matrix product. This leaves a four by four matrix containing A_{-1} and indicating the shift of equipments from a4 to a1.

$$
\begin{bmatrix}
.5 & .5 & 0 & 0 \\
0 & .5 & .5 & 0 \\
0 & 0 & .1 & .9 \\
0 & 0 & 0 & 1.0
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix}
=
\begin{bmatrix}
.5 & .5 & 0 & 0 \\
0 & .5 & .5 & 0 \\
.9 & 0 & .1 & 0 \\
1.0 & 0 & 0 & 0
\end{bmatrix}
$$

Thus, A is multiplied on the right by a matrix expressing that the probability is 1.0 that state a1 will remain a1, that a2 will remain a2, that a3 will remain a3, and that a4 will become a1.


Deletion of one row and one column is indicated by the subscript in the symbol A_{-1}. The notation will be extended later as the matrix is reduced by deletion of additional rows and columns.
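
The repair operation can be checked numerically. The sketch below is illustrative only: `matmul`, `REPAIR`, and `A_minus_1` are hypothetical names, and the repair matrix simply sends a4 to a1 while leaving the other states unchanged.

```python
A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]

# Repair matrix: a1, a2, a3 stay where they are; a4 is returned to a1.
REPAIR = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


with_repair = matmul(A, REPAIR)            # column 4 is folded into column 1
A_minus_1 = [row[:3] for row in with_repair[:3]]  # drop the now-empty row and column 4

for row in A_minus_1:
    print(row)
```

After the product, the fourth column of the first three rows is zero, which is why the row and column for a4 can be dropped.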

If the process of repairing in-service failures is continued indefinitely, a stable state is approached,* as illustrated at the end of Section 2.1. If the stable state is

$$
z(\infty) = [\alpha_1 \;\; \alpha_2 \;\; \alpha_3] ,
$$

then

$$
z(\infty)\, A_{-1} = z(\infty) .
$$

This gives the system of equations

$$
\begin{aligned}
.5\,\alpha_1 + .9\,\alpha_3 &= \alpha_1 \\
.5\,\alpha_1 + .5\,\alpha_2 &= \alpha_2 \\
.5\,\alpha_2 + .1\,\alpha_3 &= \alpha_3 \\
\alpha_1 + \alpha_2 + \alpha_3 &= 1 .
\end{aligned}
$$

Therefore, z(∞) = [.3913  .3913  .2174] .

In the stable-state condition, the expected number of in-service failures in one time interval can be obtained by multiplying the vector z(∞) by the fourth column of matrix A:

$$
z(\infty)\, A = [.3913 \;\; .3913 \;\; .2174 \;\; 0]
\begin{bmatrix}
.5 & .5 & 0 & 0 \\
0 & .5 & .5 & 0 \\
0 & 0 & .1 & .9 \\
0 & 0 & 0 & 1.0
\end{bmatrix}
$$

The fourth element in the product is (.9)(.2174) = .1957. Therefore this is the expected frequency of in-service failures in each time interval after the stable state has been reached.
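
As a numerical cross-check, the stable state can also be approached by repeated multiplication rather than by solving the equations. The sketch below assumes simple power iteration from a uniform starting vector; `stable_state` and `failures_per_interval` are hypothetical names.

```python
# A_{-1}: the example matrix after repair of in-service failures.
A_MINUS_1 = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.9, 0.0, 0.1],
]


def step(z, m):
    """Multiply row vector z by matrix m."""
    return [sum(z[i] * m[i][j] for i in range(len(z))) for j in range(len(m[0]))]


def stable_state(m, iterations=200):
    """Approximate the stable-state vector by repeated multiplication."""
    z = [1.0 / len(m)] * len(m)
    for _ in range(iterations):
        z = step(z, m)
    return z


z_inf = stable_state(A_MINUS_1)
# Only state a3 can fail in one interval, with probability .9.
failures_per_interval = z_inf[2] * 0.9

print([round(p, 4) for p in z_inf], round(failures_per_interval, 4))
```

The exact values are 9/23, 9/23, and 5/23, which round to the figures quoted in the text.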

*  If  no  repairs  are  made,  the  stable  state  developed  from  matrix 
A  is  [  0  0  0  1  ]  . 


To illustrate the computational procedure, suppose preventive maintenance were performed at the end of every sixth interval. Then the total number of maintenance actions would be the sum of repairs of in-service failures (performed at the end of each interval) and the required preventive maintenance actions at the end of the sixth interval. With four states, a1, a2, a3, and a4, with a4 being failure, the a4 would be shifted by repair of in-service failure to a1 at the end of each interval. Preventive maintenance would consist of transfer of a3 (or a3 and a2) to a1 at the end of every sixth time interval.

These  maintenance  actions  can  be  expressed  in  matrix  notation 
in  the  manner  shown  below. 

Let 

x = [x1, x2, x3]

denote the probability vector for the initial distribution of equipments by states. The transition matrix A_{-1} describes the state transition in one time interval if in-service failures are repaired but no preventive maintenance is performed.* The probability vector at the end of one time interval is the product xA_{-1}. This vector becomes xA_{-1}^n at the end of the nth interval. In applying this method it is convenient to compute powers of A_{-1}. The powers of interest in this example are shown below.

* As indicated in Section 2.3, the notation A_{-1} was adopted to denote the deletion of one row and one column following the addition of Column 4 to Column 1. If the preventive maintenance schedule called for repair of equipments in state a3, two columns would be added to Column 1, and the notation would be A_{-2}; etc.


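$$
A_{-1}^{2} = \begin{bmatrix} .25 & .50 & .25 \\ .45 & .25 & .30 \\ .54 & .45 & .01 \end{bmatrix}
\qquad
A_{-1}^{3} = \begin{bmatrix} .350 & .375 & .275 \\ .495 & .350 & .155 \\ .279 & .495 & .226 \end{bmatrix}
$$

$$
A_{-1}^{4} = \begin{bmatrix} .4225 & .3625 & .2150 \\ .3870 & .4225 & .1905 \\ .3429 & .3870 & .2701 \end{bmatrix}
\qquad
A_{-1}^{5} = \begin{bmatrix} .40475 & .39250 & .20275 \\ .36495 & .40475 & .23030 \\ .41454 & .36495 & .22051 \end{bmatrix}
$$

$$
A_{-1}^{6} = \begin{bmatrix} .38485 & .39862 & .21652 \\ .38974 & .38485 & .22540 \\ .40572 & .38974 & .20452 \end{bmatrix}
\qquad
A_{-1}^{7} = \begin{bmatrix} .38729 & .39175 & .22096 \\ .39773 & .38730 & .21496 \\ .38694 & .39773 & .21533 \end{bmatrix}
$$

$$
A_{-1}^{8} = \begin{bmatrix} .39252 & .38952 & .21796 \\ .39234 & .39252 & .21514 \\ .38726 & .39234 & .22040 \end{bmatrix}
$$
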

It will be noted that the rows converge toward the stable-state vector

[.3913  .3913  .2174] ,

which was derived by the method described in Sections 2.1 and 2.3.
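
The powers of A_{-1} and their convergence can be reproduced with a short sketch. The helper names `matmul` and `power` are hypothetical; only the matrix itself comes from the example.

```python
A_MINUS_1 = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.9, 0.0, 0.1],
]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def power(m, n):
    """Return m multiplied by itself n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = matmul(result, m)
    return result


for n in range(2, 9):
    print(n, [[round(p, 5) for p in row] for row in power(A_MINUS_1, n)])
```

By the eighth power every row lies within about .004 of the stable-state vector.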


If the preventive maintenance schedule requires the transfer of state a3 equipments to state a1 at the end of n intervals, the stable-state vector at the end of the nth interval is determined by the same methods. For example, if n = 4 (note A_{-1}^4), the transition matrix is

$$
\begin{bmatrix} .4225 + .2150 & .3625 \\ .3870 + .1905 & .4225 \end{bmatrix}
=
\begin{bmatrix} .6375 & .3625 \\ .5775 & .4225 \end{bmatrix} ,
$$

which, when multiplied on the left by the stable-state probability vector [α1  α2], reproduces this vector. Thus

$$
[\alpha_1 \;\; \alpha_2]
\begin{bmatrix} .6375 & .3625 \\ .5775 & .4225 \end{bmatrix}
= [\alpha_1 \;\; \alpha_2] .
$$

This is equivalent to

$$
\begin{aligned}
.6375\,\alpha_1 + .5775\,\alpha_2 &= \alpha_1 \\
.3625\,\alpha_1 + .4225\,\alpha_2 &= \alpha_2 .
\end{aligned}
$$

Replacing one equation by α1 + α2 = 1, the solution is found to be

α1 = .614,  α2 = .386.

Therefore, the stable-state vector in this instance is

[.614  .386] .

This is really an abbreviation for the vector [.614  .386  0  0], which indicates that no equipments are left in states a3 and a4 from one preventive-maintenance interval to another. It must be remembered, however, that the equipments do pass through these states in the intervals between preventive maintenance actions.
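
The same calculation can be sketched for any interval n. The function below is an illustration under the assumptions of this example (only state a3 is replaced, and replacement returns the equipment to a1); it uses the closed-form stable state of a two-state chain, and the name `pm_stable_state` is hypothetical.

```python
A_MINUS_1 = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.9, 0.0, 0.1],
]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def pm_stable_state(n):
    """Stable-state (alpha1, alpha2) when a3 is replaced every n-th interval."""
    p = A_MINUS_1
    for _ in range(n - 1):
        p = matmul(p, A_MINUS_1)
    # Adding column 3 to column 1 leaves a two-state chain on (a1, a2).
    to_a2 = p[0][1]               # a1 -> a2 over the n intervals
    to_a1 = p[1][0] + p[1][2]     # a2 -> a1, directly or through replacement
    total = to_a1 + to_a2
    return to_a1 / total, to_a2 / total


for n in (1, 2, 3, 4, 6):
    print(n, [round(a, 4) for a in pm_stable_state(n)])
```

For n = 4 this gives approximately [.614  .386], in agreement with the solution above.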


Stable-state probability vectors for a selection of preventive maintenance schedules are shown below:

Preventive-Maintenance       Stable-State Vector
Interval (n)                   a1      a2     a3   a4

     1                      [ .5      .5      0    0 ]
     2                      [ .6      .4      0    0 ]
     3                      [ .634    .366    0    0 ]
     4                      [ .614    .386    0    0 ]
     8                      [ .6381   .3619   0    0 ]

The number of failures which occur in the time interval between preventive maintenance actions is determined by chaining the nth stable-state vector, using multiplication by matrix A "n" times.* This procedure is illustrated for the case where preventive maintenance occurs every fourth interval.

Period

  1    [.614    .386    0       0]A = [.3070   .5000   .1930   0     ] = z(1)

  2    [.3070   .5000   .1930   0]A = [.15350  .40350  .2693   .1737 ] = z(2)

                                       .1737 + .15350 = .3272**

  3    [.3272   .4035   .2693   0]A = [.16360  .36535  .22868  .24237] = z(3)

  4    [.40597  .36535  .22868  0]A = [.20299  .38566  .20554  .20581] = z(4)

The total number of in-service failures which occur during each four-interval period between preventive maintenance actions is 0 + .1737 + .2424 + .2058 = .6219. The number of equipments which undergo preventive maintenance, those in state a3 at the end of the fourth period, is .20554.
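
The chaining procedure can be sketched as a loop. This is a minimal illustration, assuming (as in the example) that failures are repaired to a1 at the end of each interval and that only state a3 is replaced at the end of the cycle; `pm_cycle` and `stable_counts` are hypothetical names, and the stable state is reached here by simply repeating the cycle many times.

```python
A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]


def step(z, m):
    """Multiply row vector z by matrix m."""
    return [sum(z[i] * m[i][j] for i in range(len(z))) for j in range(len(m[0]))]


def pm_cycle(start, n):
    """Follow one preventive-maintenance cycle of n intervals.

    start is the distribution over (a1, a2, a3) just after maintenance.
    Returns (in-service failures, preventive replacements, next start).
    """
    z = list(start) + [0.0]
    failures = 0.0
    for _ in range(n):
        z = step(z, A)
        failures += z[3]      # equipments that failed during this interval
        z[0] += z[3]          # repair returns them to a1,
        z[3] = 0.0            # so a4 is empty when the next interval begins
    replaced = z[2]           # a3 equipments replaced at the end of the cycle
    return failures, replaced, [z[0] + z[2], z[1], 0.0]


def stable_counts(n, cycles=200):
    """Failures and replacements per cycle once the stable state is reached."""
    start = [1.0, 0.0, 0.0]
    for _ in range(cycles):
        _, _, start = pm_cycle(start, n)
    failures, replaced, _ = pm_cycle(start, n)
    return failures, replaced


for n in (1, 2, 3, 4, 6):
    print(n, [round(v, 4) for v in stable_counts(n)])
```

Small differences in the last digit from the hand computation are to be expected, since the text rounds the stable-state vector before chaining.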

The  total  number  of  in-service  failures  and  preventive  mainte¬ 
nance  replacements  for  other  groups  of  intervals  are  determined  in  a 
similar  fashion.  The  chaining  occurs  a  different  number  of  times 
and  the  associated  stable  states  are  different  for  different  intervals. 
Computations  for  preventive  maintenance  every  sixth  interval  are  given 
below. 


* See procedure given on page 27.

** During the second interval, .1737 failures occurred. Since it was hypothesized that these equipments were repaired during the interval (restored to state a1), there is no state a4 beginning with the third interval.


z(0) = [ .6067   .3932   0       0        ]
z(1) = [ .3034   .5000   .1966   0        ]
z(2) = [ .3286   .4017   .2697   (.1769)* ]
z(3) = [ .4070   .3652   .2278   (.2427)  ]
z(4) = [ .4085   .3861   .2054   (.2050)  ]
z(5) = [ .3891   .3973   .2136   (.1849)  ]
z(6) = [ .3868   .3932   .2200   (.1922)  ]

n1 = 1.0017 = number of in-service failures
n2 = .2200 = number of preventive replacements.

* State a4 is parenthesized because these values have been added to state a1.

Table 2 gives the number of in-service failures (n1) and the number of equipments replaced during preventive maintenance (n2) for various values of n, the number of intervals between preventive maintenance actions.


TABLE 2

NUMBER OF IN-SERVICE FAILURES AND PREVENTIVE REPLACEMENTS, WHEN
PREVENTIVE MAINTENANCE IS SCHEDULED EVERY nth INTERVAL

Interval     In-Service Failures     Preventive Replacements
  (n)              (n1)                      (n2)

   1               0                         .25
   2               .18                       .27
   3               .4062                     .2311
   4               .6219                     .2055
   5               .8109                     .2137
   6               1.002                     .2200
   8               1.3842                    .2169


2.3.2  Summary of the Mathematical Model

The mathematical model developed in preceding sections expresses the deterioration pattern of an equipment in the form of a Markov process. This transition matrix covers an interval of time which is
sufficiently short to preclude the influence of second-generation equipments, that is, equipments which are repaired and returned to service before the end of the interval.* The distribution of equipments by performance level or state at any subsequent time is expressed as the product of the initial state distribution and an appropriate power of the transition matrix. This power is equal to time, expressed in units of the interval for which the transition matrix is applicable.

Maintenance  procedures  can  be  expressed  as  modifications  of  the 
transition  matrix.  Repair  of  in-service  failures  is  reflected  by 
the  addition  of  the  failure-state  column  to  the  column  representing 
the  state  following  repair.  Preventive  maintenance  is  reflected  by 
the  addition  of  lower-state  columns  to  higher-state  columns  as 
appropriate. In the present discussion, it is always assumed that preventive maintenance and repair of in-service failures restore the equipment to the highest state, which is represented by the first column of the matrix.

If in-service failures are repaired immediately after occurrence (it is implicitly assumed that they will be repaired within the time period covered by the transition matrix) and equipments in certain states lower than state a1 are repaired at the end of every nth interval, a stable state is developed around this replacement pattern.


* The selection of appropriate intervals is discussed in Section 1 of the Appendix.
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Each  replacement  pattern  will  generate  a  different  number  of 
failures since each will have its own stable state. The arithmetic is expressed in terms of matrix multiplication. By computing several replacement patterns which differ both as to time interval and replacement level, it is possible to compare the total number of in-service failures and preventive replacements generated by each maintenance pattern. This comparison provides the data required to determine which pattern of maintenance yields the lowest cost. All cost computations can be based on stable-state distributions, since the ultimate average cost is independent of the initial state distribution.


DETERMINATION OF OPTIMUM MAINTENANCE SCHEDULES


3.1  The  Cost  Equation 


The expected average cost per unit time is given by the equation

$$
C = \frac{1}{h}\,[\,k_1 n_1 + k_2 n_2 + k_3\,] ,
$$

where:

h  = the number of time units between periodic preventive maintenance actions; one time unit is the period of time for which the transition matrix is developed.

k1 = the cost of repair of an in-service failure.

k2 = the cost of a scheduled preventive maintenance action.

k3 = the cost of periodic test or measurement of performance level.

n1 = the expected number of in-service failures in h time units.

n2 = the expected number of scheduled preventive maintenance actions in h time units.


The situations of interest are those in which the cost of repairing an in-service failure is considerably greater than the cost of a preventive maintenance action at scheduled maintenance intervals (there would be little reason for preventive maintenance if it were more expensive than repair of an in-service failure). Therefore, in the numerical illustrations which follow, it is assumed that the values of k1 are considerably larger than the values of k2. It is also assumed that the values of k3 (the cost of making the periodic check of equipment performance) are less than either of the other costs, although this is not a necessary assumption and has no effect on the validity of the method.


3.2  A Four-State Example


To illustrate the method, cost computations for the above numerical example are made for a selection of cost parameters.

Assume two sets of values:

(1)  k3 = 1,  k2 = 4,  k1 = 8

(2)  k3 = 1,  k2 = 4,  k1 = 16.

If no preventive maintenance is performed, that is, if equipment is repaired only after failure, there will be a constant failure rate of .1957 per time interval (see page 27), and the cost equation is

C = [.1957 k1 + 0 k2 + 0 k3] = 1.56, if k1 = 8

                             = 3.13, if k1 = 16.

Assume preventive maintenance is performed every nth interval by restoring equipments in state a3 to state a1. Then the resulting cost equations and total costs are those shown in Table 3. The costs listed in the table are plotted in Figure 4. These curves lead to the following observations. When Schedule 1 costs are assumed, preventive maintenance at any time interval is more costly than none at all (the cost of repair of in-service failures with no preventive maintenance is indicated by the horizontal line marked k1 = 8). On the other hand, when Schedule 2 costs are assumed, any preventive maintenance is better than none, irrespective of time, the optimum situation being obtained when preventive maintenance is performed at the end of the first measurement interval.


TABLE 3

AVERAGE COST PER UNIT TIME WHEN PREVENTIVE MAINTENANCE IS PERFORMED
EVERY nth INTERVAL BY REPLACING EQUIPMENTS IN STATE a3

Replacement   Cost                               Cost Schedule 1*   Cost Schedule 2*
Interval      Equation                           k1 = 8             k1 = 16

    1         (1/1) [0k1 + .25k2 + k3]           2.00               2.00
    2         (1/2) [.18k1 + .27k2 + k3]         1.76               2.48
    3         (1/3) [.4062k1 + .2311k2 + k3]     1.72               2.81
    4         (1/4) [.6219k1 + .2055k2 + k3]     1.70               2.94
    5         (1/5) [.8110k1 + .2136k2 + k3]     1.67               2.97
    6         (1/6) [1.002k1 + .2200k2 + k3]     1.65               2.98
    8         (1/8) [1.384k1 + .2169k2 + k3]     1.62               3.00

* In both cost schedules, k2 = 4 and k3 = 1.


By a similar procedure, the relative costs of other patterns of preventive maintenance (for example, replacement of equipments in states a2 and a3) can be determined. Such a comparison would be necessary for a complete cost analysis. However, in view of the small number of performance levels involved here, further study of this simple example is unwarranted.
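
The cost comparison for the four-state example can be reproduced with a short sketch. The function name `average_cost` and the dictionary `COUNTS` are hypothetical; the n1 and n2 values are those used in the Table 3 cost equations, and .1957 is the stable failure rate with no preventive maintenance.

```python
def average_cost(h, n1, n2, k1, k2, k3):
    """Expected average cost per unit time: (1/h) [k1 n1 + k2 n2 + k3]."""
    return (k1 * n1 + k2 * n2 + k3) / h


# (n1, n2) for each replacement interval, as used in the Table 3 cost equations.
COUNTS = {
    1: (0.0, 0.25),
    2: (0.18, 0.27),
    3: (0.4062, 0.2311),
    4: (0.6219, 0.2055),
    5: (0.8110, 0.2136),
    6: (1.002, 0.2200),
    8: (1.384, 0.2169),
}

K2, K3 = 4, 1
for k1 in (8, 16):
    no_pm = 0.1957 * k1          # repair after failure only, no periodic test
    costs = {h: average_cost(h, n1, n2, k1, K2, K3) for h, (n1, n2) in COUNTS.items()}
    best = min(costs, key=costs.get)
    print(k1, round(no_pm, 2), {h: round(c, 2) for h, c in costs.items()}, best)
```

With k1 = 8 every schedule costs more than no preventive maintenance; with k1 = 16 every schedule costs less, and the first interval is cheapest.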

Intuitively, one would expect the optimum time for replacement to occur somewhere between the first and the infinite intervals; i.e., there would seem to be few actual situations in which maintenance "as early as possible" or "not at all" would be warranted. In fact, the small simple matrix used in the preceding example would fit few actual situations. Because of the values selected and the small number of classes used, a stable state is reached very quickly, which forces second- and third-generation failures to enter rapidly into the average failure rate. Table 2 (see page 32) indicates that failures begin in the second interval, that nearly two-thirds of them have occurred by the end of the fourth interval, and that new-generation failures have occurred by the end of the sixth interval. Thus, the equipments rapidly reach a random-age distribution, and the failure density function for the second generation so overlaps the first-generation density that the cost curves (which reflect system failures) are quite smooth.*


* Welker, Dr. E. L., "Relationship Between Equipment Reliability, Preventive Maintenance Policy, and Operating Costs," ARINC Research Corporation, February 13, 1959 (Publication No. 101-9-135), pp. 20 ff.

It should also be noted that the curves shown in Figure 4 begin at the first time interval. All cost curves start with this interval, as cost equations and number of failures can be determined only for intervals which are at least as long as the one selected for the transition matrix. If the basic time interval in this example were shortened, the beginning value for both curves in the figure would be considerably higher, because the cost of nearly continuous checking (k3) would be much higher over any given interval of time.

3.3  A  Seven-State  Example 

To illustrate a more typical case, another example is given. In this example, the underlying density function has a smaller coefficient of variation, a property which will turn out to be critical in developing a more usual deterioration pattern.

$$
B = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & .5 & .5 & 0 & 0 \\
0 & 0 & 0 & 0 & .5 & .5 & 0 \\
0 & 0 & 0 & 0 & 0 & .1 & .9 \\
0 & 0 & 0 & 0 & 0 & 0 & 1.0
\end{bmatrix}
$$

It will be noted that the submatrix shown in the lower right-hand corner is identical to the one used in previous illustrations. Appending the first three rows and columns has the effect of shifting the failure density to the right by three time intervals (see Figure 3). Therefore, the mean of the distribution is now 8.1 time units, but the standard deviation of 2.0 is unchanged from the original example.* The net effect of this shift is to decrease the coefficient of variation from .40 to .25. The additional columns can be regarded as additional states. State a7 now designates failures which, when repaired, are returned to state a1.
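
The quoted mean, standard deviation, and coefficient of variation can be checked numerically. The sketch below rests on an assumption: the seven-state matrix is built by prepending three states that each advance one state per interval with certainty, which is what a pure shift of the failure density implies. The helper names `shifted` and `failure_moments` are hypothetical, and the sum is truncated at a horizon beyond which the remaining probability is negligible.

```python
import math

A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]


def shifted(m, extra):
    """Prepend `extra` states that each advance one state per interval (assumed form)."""
    n = len(m) + extra
    b = [[0.0] * n for _ in range(n)]
    for i in range(extra):
        b[i][i + 1] = 1.0
    for i, row in enumerate(m):
        for j, p in enumerate(row):
            b[i + extra][j + extra] = p
    return b


def failure_moments(m, horizon=400):
    """Mean, standard deviation, and coefficient of variation of time to failure."""
    z = [1.0] + [0.0] * (len(m) - 1)
    previous_U = mean = second = 0.0
    for t in range(1, horizon + 1):
        z = [sum(z[i] * m[i][j] for i in range(len(z))) for j in range(len(z))]
        u = z[-1] - previous_U      # probability of failure in interval t
        previous_U = z[-1]
        mean += t * u
        second += t * t * u
    sd = math.sqrt(second - mean * mean)
    return mean, sd, sd / mean


mean_a, sd_a, cv_a = failure_moments(A)
mean_b, sd_b, cv_b = failure_moments(shifted(A, 3))
print(round(mean_a, 2), round(sd_a, 2), round(cv_a, 2))
print(round(mean_b, 2), round(sd_b, 2), round(cv_b, 2), round(1 / mean_b, 4))
```

The reciprocal of the mean time to failure is the stable failure rate when failures are repaired to a1 and no preventive maintenance is performed.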

3.3.1  "Dummy" Columns in the Transition Matrix

In actual practice, it may be necessary to add "dummy" columns to the transition matrix derived from empirical data. Whether or not this is done depends upon the testing instruments used in the experiment. For example, if these are sufficiently sensitive to measure seven rather than four classes of performance, there may be values other than zero or one in the first three columns. On the other hand, if the equipment is a receiver with a considerable number of redundant elements, it will probably register performance close to peak levels (state a1) for a long period of time and then fall quite rapidly. In this situation, the average operating level would remain almost constant and then drop off sharply, as shown in Figure 5.
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Actually,  under  the  hypothesis  that  failure  is  a  gradual 
deterioration  phenomenon,  it  can  be  assumed  that  the  deterioration 
shown in Figure 5 began at time t0 but that the instrument was unable to measure it. Thus, dummy columns would be required to shift the failure density function to the right, as in the case of matrix B.

The  portion  of  the  curve  to  the  right  of  is  analogous  to  the  entire 
curve  shown  in  Figure  2.  This  matter  is  discussed  in  more  detail  in 
Section  2  of  the  Appendix. 

3.3.2  Cost Equations for Matrix B

The computations associated with matrix B are carried out in the same manner as those of the previous example. Addition of Column 7 to Column 1 constitutes repair of in-service failures, while addition of any other columns to Column 1 constitutes preventive maintenance. Cost equations for preventive maintenance at intervals up to 10 are shown in Tables 4 and 5: Group I equations for maintenance involving replacement of equipments in state a6, and Group II equations for maintenance involving replacement of equipments in states a5 and a6.

The following sets of costs are used in the tables.

Cost     Schedule 1     Schedule 2     Schedule 3     Schedule 4

k1            8             11             15             19
k2            4              4              4              4
k3            1              1              1              1

The cost equation and maintenance costs for repair of in-service failures only (no preventive maintenance actions) are:


TABLE 4

GROUP I COST EQUATIONS: AVERAGE COST PER UNIT TIME WHEN PREVENTIVE
MAINTENANCE IS PERFORMED EVERY nth INTERVAL BY REPLACING
EQUIPMENTS IN STATE a6


Replacement Interval     Cost Equation     Cost Schedules*


i  (0^  +  .14;^  +  lc33 
I  [[.13%  +  .13%  +  k3) 
^  [.37.%  +  .13%  +  %] 

i  [.Ou%  +  .l,£,i:2  +  lt.J 
i.  [.Vt>Vki  +  .  1'  ok__  +  ki] 

f0  [[1.1%  +  .1.-%  +  k.) 


* In each of the three cost schedules, k2 = 4 and k3 = 1.


TABLE 5

GROUP II COST EQUATIONS: AVERAGE COST PER UNIT TIME WHEN PREVENTIVE MAINTENANCE
IS PERFORMED EVERY ith INTERVAL BY REPLACING EQUIPMENTS IN STATES a, AND a •


Replacement                                        Cost Schedule*
 Interval          Cost Equation                       k1 = 8

     1       (1/1)  [0k1    + .2k2    + k3]             1.80
     2       (1/2)  [0k1    + .3750k2 + k3]             1.25
     4       (1/4)  [.158k1 + .507k2  + k3]             1.07
     6       (1/6)  [.332k1 + .565k2  + k3]             .986
     8       (1/8)  [.789k1 + .277k2  + k3]             1.05
    10       (1/10) [.987k1 + .356k2  + k3]             1.03


* In each of the four cost schedules, k2 = 4 and k3 = 1.
C = [.1233k1 + 0k2 + 0k3]

Schedule  1  =  0.9 
Schedule  2  =  1.36 
Schedule  3  =  1.85 
Schedule  4  =  2.34 

The  curves  in  Figures  6  and  7  show  the  effect  of  different  cost 
schedules  and  replacement  times  on  the  average  hourly  cost  of  replac¬ 
ing,  respectively,  state  ag  equipments  only  and  state  a^  and  state  a^ 
equipments. Figure 6 indicates that preventive maintenance is
consistently less costly than repair of in-service failures when the
latter has values of k1 = 15 and k1 = 19 (see the two top curves). When
repair of in-service failures has a value of k1 = 11, preventive
maintenance is less costly only when performed somewhere between the
sixth and the tenth intervals, the optimum time being at the end of
the eighth interval.

In Figure 7, showing the variation in costs of replacing equipments
in states ag and a^, an extra curve based on schedule 1 costs
is included. When this schedule is assumed, the minimum cost occurs
when preventive maintenance is performed at the end of the sixth
interval; however, at this point, preventive maintenance is still
slightly more expensive than no preventive maintenance. For k1 = 11,
the optimum point occurs at the end of the sixth interval, and there
is a distinct advantage in preventive maintenance except when
performed at the end of the first interval. A similar advantage is
gained by performing preventive maintenance when k1 = 15 and k1 = 19,
but the optimum time for replacement is now the end of the second
rather than the end of the sixth interval.
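
The comparison described above can be checked numerically. The sketch
below is only an illustration: the coefficients are the Group II
values as they can be read from Table 5, the interval numbers 1, 2, 4,
6, 8 and 10 are inferred from the tabulated costs, and the names
GROUP_II, pm_cost and best_interval are hypothetical.

```python
# Group II cost equations, C_i = (1/i) * (a*k1 + b*k2 + k3).
# Keys are replacement intervals; values are the (a, b) coefficients
# as read from Table 5 (an assumption where the print is unclear).
GROUP_II = {
    1: (0.0, 0.2),
    2: (0.0, 0.3750),
    4: (0.158, 0.507),
    6: (0.332, 0.565),
    8: (0.789, 0.277),
    10: (0.987, 0.356),
}

# With no preventive maintenance the report gives C = .1233 * k1.
NO_PM_COEFF = 0.1233


def pm_cost(interval, k1, k2=4.0, k3=1.0):
    """Average cost per unit time for replacement every `interval`."""
    a, b = GROUP_II[interval]
    return (a * k1 + b * k2 + k3) / interval


def best_interval(k1, k2=4.0, k3=1.0):
    """Replacement interval with the lowest average cost."""
    return min(GROUP_II, key=lambda i: pm_cost(i, k1, k2, k3))


if __name__ == "__main__":
    for k1 in (8, 11, 15, 19):
        i = best_interval(k1)
        print(f"k1={k1:2d}  best interval={i:2d}  "
              f"cost={pm_cost(i, k1):.3f}  no-PM cost={NO_PM_COEFF * k1:.3f}")
```

Under these assumptions the lowest-cost interval is the sixth for
k1 = 8 and k1 = 11, and the second for k1 = 15 and k1 = 19, which
agrees with the discussion of Figure 7.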


FIGURE 6

[Plot. Vertical axis: average cost per unit time.]


COST  OF  REPAIR  OF  IN-SERVICE  FAILURE  vs.  COST  OF 
SCHEDULED  PREVENTIVE  MAINTENANCE,  WHEN  EQUIPMENTS 
IN  STATE  o»  ARE  REPLACED  AT  VARIOUS  INTERVALS 
FIGURE 7

COST OF REPAIR OF IN-SERVICE FAILURE vs. COST OF
SCHEDULED PREVENTIVE MAINTENANCE, WHEN EQUIPMENTS
IN STATES 04 AND a> ARE REPLACED AT VARIOUS INTERVALS


APPENDIX 

1.  Selection of Maintenance Interval and Time Period for the
Transition Matrix

The Markov process requires that the transition matrix be
raised to powers higher than one in order to determine the probable
states of the equipment after, say, n time intervals. The method
assumes that if an equipment fails in service, it will not be
returned to service until the beginning of the next time interval.
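
As an illustration of raising the transition matrix to a power, the
following sketch uses the four-state matrix that appears in Section 4
of this Appendix. The function names mat_mul and mat_pow are
hypothetical, and plain repeated multiplication is used only for
clarity.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_pow(p, n):
    """Transition matrix p raised to the n-th power (n >= 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Four-state example matrix from Section 4; state a4 (last column) is failure.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]

# Row 1 of P^3: probable states, after three intervals, of an equipment
# that started in state a1.
P3 = mat_pow(P, 3)
after_three = P3[0]

if __name__ == "__main__":
    print([round(x, 4) for x in after_three])
```

Row i of the n-th power gives the distribution by state after n
intervals for an equipment that started in state i.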
In the example presented on page 20, states a1, a2, a3, and a4
were used for illustrative purposes, with no discussion of how they
would be chosen in an application of this method. In practice, the
raw data would best be recorded as direct numerical observations,
especially if the instrumentation were sensitive or the measurement
scale were long. Then the definition of states would be the
practical problem of data grouping.

Assume that the curve in Figure 8 represents the average change
in performance with time, and that t0 and tn denote, respectively,
the beginning and end of the period of observation. The following
method is one which might be used to select state boundaries. The
time interval, t0 to tn, may be divided into an arbitrary number of
different groupings which are equal to the number of states to be
used in the transition matrix. The selection of this number is
entirely one of judgment. For example, assume that eight is the
selected number of states, and divide the time axis into eight
uniform divisions. Vertical and horizontal lines drawn from these
divisions to intersect the curve, as indicated in Figure 8, will
divide the vertical scale into as many different classes as there
are divisions of the time axis. The probability that an equipment
will be in any one of these groupings is computed by the method
illustrated. These groupings will not produce uniform numerical
divisions on the ordinate scale unless the average curve is linear.
Nevertheless, the divisions will be "uniform" insofar as they
represent an average range of performance covered by the equipment
in the uniform time period selected on the time axis.
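
The boundary selection just described can be sketched as follows. The
average curve used here (100 - t*t over eight time units) is purely
hypothetical, as are the function names; performance is assumed to
decline with time.

```python
def state_boundaries(avg_curve, t0, tn, n_states):
    """Performance values of the average curve at uniform time divisions.

    Returns n_states + 1 boundary values; class k lies between
    boundaries[k - 1] and boundaries[k].
    """
    step = (tn - t0) / n_states
    return [avg_curve(t0 + k * step) for k in range(n_states + 1)]


def classify(value, boundaries):
    """State number (1-based) for an observed performance value.

    Assumes performance declines with time, so boundaries are descending.
    Values below the last boundary are placed in the last state.
    """
    for state in range(1, len(boundaries)):
        if value >= boundaries[state]:
            return state
    return len(boundaries) - 1


def hypothetical_curve(t):
    """Illustrative average performance curve, not taken from the report."""
    return 100.0 - t * t


boundaries = state_boundaries(hypothetical_curve, 0.0, 8.0, 8)

if __name__ == "__main__":
    print(boundaries)
    print(classify(95.0, boundaries))
```

Because the assumed curve is not linear, the eight classes are of
unequal width on the performance scale, although each corresponds to
one uniform division of the time axis.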
FIGURE 8

SELECTION  OF  STATE  BOUNDARIES  FROM  OBSERVED 
VALUES  OF  A  PERFORMANCE  CHARACTERISTIC 


There is one further consideration. The average curve may be
nearly horizontal for low values of t, as shown in Figure 5 (for
example, when parallel redundancy exists). In this instance, the
performance measurement will tell nothing of the probable
relationship of the equipment to the time axis in this interval. However,
the transition matrix can reflect the fact that the equipment is
moving uniformly through this time interval or through the classes
below the horizontal portion of the curve. Such a matrix would have
zeros and ones in the columns which pertain to these classes, as in
matrix B, page 39. It should be noted that the performance
characteristic measured will usually be an operating characteristic
correlated with, but not identical to, the part deterioration
characteristic itself.
If the foregoing procedure should result in the inclusion of
certain "unnecessary" states, these should be eliminated. For
example, if the transition matrix should indicate very low
probabilities that an equipment will be in state a_k at the beginning of the
interval, it is proper to say that state a_k is unimportant and can
be combined with state a_(k-1) or state a_(k+1).


Method for Computation of p_ij of Transition Matrix from
Empirical Data


To illustrate the method for computing the p_ij of the transition
matrix from empirical data, assume four equipments and four levels of
performance. They are observed for a maximum of eight time intervals.
The data (which could be derived just as well from observations on
one equipment which was repaired or returned to operating level a1
four times) are listed in Table 6. The point to stress is that
each line represents the deterioration of one equipment from a higher
operating state to failure, state a4. If the equipment fails and is
restored to a higher state through repair, another line must be added
to the table.


In  recording  data  for  use  in  developing  a  transition  matrix,  it 
is  not  sufficient  simply  to  list  all  equipments  which  are  in  a  given 
state  at  the  end  of  any  given  time  period,  for  this  information  does 
not  indicate  the  states  which  the  equipments  were  in  before  progress¬ 
ing  to  the  observed  states.  The  observed  state  and  the  immediately 
preceding  state  must  both  be  recorded  for  each  equipment  at  the  end 
of  each  interval,  so  that  the  matrix  will  indicate  the  transition 
from  one  state  to  another  within  the  interval. 
TABLE 6

FREQUENCY OF PASSAGE FROM ONE GIVEN STATE TO ANOTHER IN A
GIVEN TIME INTERVAL, FOR FOUR EQUIPMENTS

                      Part I                         Part II
Equipment    State in Each Interval      Frequency of Passage from State
 Number                                    to State in One Interval

             0  1  2  3  4  5  6  7  8   a11 a12 a13 a14 a22 a23 a24 a33 a34
                                         (1) (2) (3) (4) (5) (6) (7) (8) (9)

    1       a1 a1 a1 a1 a2 a2 a2 a3 a4    3   1           2   1           1
    2       a1 a1 a1 a3 a3 a3 a3 a3 a4    2       1                   4   1
    3       a1 a1 a1 a1 a1 a4             4           1
    4       a2 a2 a2 a2 a2 a2 a3 a3 a4                    5   1       1   1

  Total                                   9   1   1   1   7   2   0   5   3

Part I of the table lists the states observed for each of the
equipments in each of the eight intervals. The values at time 0 are
beginning values, those under "1" are values at the end of the first
interval, and so on. Thus, Equipment No. 3 was observed to be in
state a1 for four intervals, and it failed during the fifth interval.
Part II indicates the number of times the equipments passed from one
given state to another in one interval. The first four columns give
the frequency of passage from state a1 to lower states; the next
three columns give the frequency of passage from state a2 to lower
states; and the last two columns give the frequency of passage from
state a3 to failure, a4. The p_ij are computed from these frequencies.

The following tabulation and estimated stochastic matrix are
determined from the values in Table 6.


           a1      a2      a3      a4       Σ

   a1       9       1       1       1      12
   a2               7       2       0       9
   a3                       5       3       8


           a1      a2      a3      a4

   a1     9/12    1/12    1/12    1/12
   a2             7/9     2/9      0
   a3                     5/8     3/8
   a4                             (1)
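
The counting procedure can be sketched in a few lines. The state
histories below are an assumed reading of Part I of Table 6, chosen to
be consistent with the Part II totals (9, 1, 1, 1, 7, 2, 0, 5, 3); the
function names are hypothetical. States are coded 1 to 4, with 4
denoting failure.

```python
from fractions import Fraction

# One list per equipment: observed state at times 0, 1, 2, ...
# (assumed reading of Table 6, Part I).
HISTORIES = [
    [1, 1, 1, 1, 2, 2, 2, 3, 4],
    [1, 1, 1, 3, 3, 3, 3, 3, 4],
    [1, 1, 1, 1, 1, 4],
    [2, 2, 2, 2, 2, 2, 3, 3, 4],
]
N_STATES = 4


def transition_counts(histories, n_states):
    """Count passages from each state to each state in one interval."""
    counts = [[0] * n_states for _ in range(n_states)]
    for history in histories:
        # Pair every observed state with the immediately preceding state.
        for previous, current in zip(history, history[1:]):
            counts[previous - 1][current - 1] += 1
    return counts


def estimate_matrix(counts):
    """Divide each row of counts by its total to estimate the p_ij."""
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No passages observed out of this state (failure):
            # treated here as absorbing, an assumption of this sketch.
            matrix.append([Fraction(int(i == j)) for j in range(len(row))])
        else:
            matrix.append([Fraction(c, total) for c in row])
    return matrix


counts = transition_counts(HISTORIES, N_STATES)
p_hat = estimate_matrix(counts)

if __name__ == "__main__":
    for row in p_hat:
        print([str(x) for x in row])
```

Note that Fraction reduces 9/12 to 3/4; the estimates are otherwise
the same as those in the stochastic matrix above.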

4.  Failure Density and Reliability Functions

The unreliability function is developed by matrix multiplication
if one of the states is defined as failure. In the notation used
throughout this paper, the state in the right-hand position in the
matrix has been so defined. (In the examples, this has been state
a4 or state a7.) The density function is obtained by computing the
differences between successive values of the unreliability function.

As  an  example,  take  the  four-state  transition  matrix  given  previously, 

.5  .5  0  0 

0  .5  .5  0 

0  0  .1  .9 

0  0  0  1 

The distribution by state, the values of the unreliability function,
and the values of the density function for integral values of time
from t = 0 to t = 11 are shown in Table 7, together with the density
value for t > 11. Table 7 shows unrounded values in all cases.
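
A minimal sketch of this computation for the four-state matrix above
follows. The function names are hypothetical, and an equipment is
assumed to start in state a1 at t = 0.

```python
# Four-state transition matrix of the example; the last state is failure.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]


def step(dist, p):
    """Distribution by state one interval later (row vector times p)."""
    size = len(p)
    return [sum(dist[i] * p[i][j] for i in range(size)) for j in range(size)]


def failure_functions(p, horizon, start=0):
    """Unreliability F(t) and density u(t) for t = 0, 1, ..., horizon."""
    dist = [0.0] * len(p)
    dist[start] = 1.0
    unreliability = [dist[-1]]
    density = [0.0]
    for t in range(1, horizon + 1):
        dist = step(dist, p)
        unreliability.append(dist[-1])
        # Density is the difference of successive unreliability values.
        density.append(unreliability[t] - unreliability[t - 1])
    return unreliability, density


F, u = failure_functions(P, 11)
tail = 1.0 - F[11]  # density value lumped at t > 11

if __name__ == "__main__":
    for t in range(12):
        print(t, round(F[t], 11), round(u[t], 11))
    print("t > 11:", round(tail, 11))
```

The value obtained for t > 11 can be compared with the figure of
.01181030275 quoted later in this section.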

Before illustrating how the density function u(t) can be used
to compute the mean and variance, it is of interest to show how the
mean time-to-failure is computed directly from the transition matrix.
Denote by z_i the mean time-to-failure of an equipment, given that
the performance level is in state i. Then z1 is the mean
time-to-failure of a new equipment, i.e., an equipment with performance
in the highest level, state a1. Since state a4 denotes failure in
this example, z4 = 0.

The mean times-to-failure satisfy the following system of
linear equations:

     z1 = .5 (z1 + 1) + .5 (z2 + 1)

     z2 = .5 (z2 + 1) + .5 (z3 + 1)

     z3 = .1 (z3 + 1) + .9.

The solution of this system of equations is

     z3 = 10/9, z2 = 28/9, and z1 = 46/9.
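
The same system can be solved mechanically for any transition matrix
whose last state is failure. The sketch below writes each equation as
z_i = 1 + (sum over non-failure states j of p_ij z_j) and solves it by
elimination with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
P = [
    [HALF, HALF, Fraction(0), Fraction(0)],
    [Fraction(0), HALF, HALF, Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(1, 10), Fraction(9, 10)],
    [Fraction(0), Fraction(0), Fraction(0), Fraction(1)],
]


def mean_times_to_failure(p):
    """Solve (I - Q) z = 1, where Q is p without the failure state."""
    n = len(p) - 1  # number of non-failure states
    # Augmented matrix [I - Q | 1].
    a = [[(1 if i == j else 0) - p[i][j] for j in range(n)] + [Fraction(1)]
         for i in range(n)]
    # Gauss-Jordan elimination.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


z = mean_times_to_failure(P)

if __name__ == "__main__":
    print([str(x) for x in z])
```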

The justification of each of the foregoing equations is dependent
upon an argument of the following type. Consider the first equation
of the system as an example. If an equipment is in state a1
initially, its mean life is z1, the left side of the equation. The
right side of the equation expresses this mean life as a two-step
evaluation: the transition in one time interval and the expected
mean life thereafter. If an equipment is in state a1, the best
estimate of its expected life remains z1 in the absence of other
information regarding this variable. Thus, the .5 of the equipments
that remain in state a1 at the end of the time interval have an
expected life of z1 to be added to the one time interval previously
survived. The .5 of the equipments which deteriorate to state a2
have, at the end of the time interval, an additional expected life of
z2. The sum of these two gives the right side of the first equation
of the set. The second and third equations are justified in the
same manner. It should be noted that the third equation simplifies,
since z4 = 0.
When the mean and variance are computed directly from the
density function (which here has the form of an open-end distribution),
it is necessary, or at least advisable, to include the failures which
occur after the 11 time intervals shown in the computation. This can
be done by selecting an approximate length of life for these failures
which will yield the correct mean for the failure-density function,
u(t), as computed above, z1 = 46/9. In the example, this turns out
to be t = 13.2. Thus, it is assumed as an approximation that at
t = 13.2, there were .01181030275 failures. It is now possible to
use ordinary formulas to compute the mean life (5.111) and the
variance (4.0972).

It should be noted that the preceding computation is based on
the assumption that failures occur exactly at the end of the time
interval. An alternative assumption is that failures occur at the
midpoint. Using this assumption, the mean would be reduced to 4.611
and the variance would be unchanged.
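
The open-end adjustment can be sketched as follows. This is an
illustration only: the lumping time is solved for exactly rather than
rounded to 13.2, so the variance it gives differs slightly from the
figure quoted above. All names are hypothetical.

```python
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
]
TRUE_MEAN = 46 / 9  # z1 from the linear equations
HORIZON = 11


def failure_density(p, horizon):
    """Density u(t) for t = 1..horizon and the probability left over."""
    size = len(p)
    dist = [1.0] + [0.0] * (size - 1)  # start in state a1
    density = {}
    failed_before = 0.0
    for t in range(1, horizon + 1):
        dist = [sum(dist[i] * p[i][j] for i in range(size))
                for j in range(size)]
        density[t] = dist[-1] - failed_before
        failed_before = dist[-1]
    return density, 1.0 - failed_before


def mean_and_variance(points):
    """Ordinary mean and variance of a discrete distribution {t: prob}."""
    mean = sum(t * prob for t, prob in points.items())
    variance = sum((t - mean) ** 2 * prob for t, prob in points.items())
    return mean, variance


density, tail = failure_density(P, HORIZON)

# Length of life assigned to failures after the horizon, chosen so that
# the lumped distribution reproduces the correct mean.
partial_mean = sum(t * prob for t, prob in density.items())
t_star = (TRUE_MEAN - partial_mean) / tail

lumped = dict(density)
lumped[t_star] = tail
mean, variance = mean_and_variance(lumped)

# Alternative assumption: failures occur at the midpoint of the interval.
midpoint = {t - 0.5: prob for t, prob in lumped.items()}
midpoint_mean, midpoint_variance = mean_and_variance(midpoint)

if __name__ == "__main__":
    print("lumping time:", round(t_star, 3))
    print("end of interval:", round(mean, 4), round(variance, 4))
    print("midpoint:", round(midpoint_mean, 4), round(midpoint_variance, 4))
```

Shifting every failure time by half an interval lowers the mean by .5
and leaves the variance unchanged, as stated above.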